[WIP][Java] Use MemorySegment for index serialization by ldematte · Pull Request #1035 · rapidsai/cuvs

ldematte · 2025-06-20T14:42:31Z

NOTE: this WIP/draft uses code from #1033 and #1024 -- don't worry, I'll sort out the merge order of what we want/can merge and update accordingly.

This PR adds a new method to serialize indices (for now, only Cagra -- for demonstration purposes) to memory, instead of relying on a separate temp file. This is cleaner, saves 50% of disk space, and it's up to 10x faster.

The PR includes both the necessary Java changes and the related C API changes -- again, this draft is for discussion and to showcase how this could work; if we like the approach, my plan is to raise separate C and Java PRs (first get the C API change merged, then consume/use it in the Java API).
If we like the approach, a similar change would be needed for deserialization (CagraIndex#build/ cuvsCagraDeserialize).

Benchmarks:

Benchmark                                           (dims)  (size)   Mode  Cnt     Score    Error  Units
CagraSerializationBenchmarks.testSerializeToFile      1024     100  thrpt    5   561.699 ±  3.401  ops/s
CagraSerializationBenchmarks.testSerializeToMemory    1024     100  thrpt    5  6055.371 ± 33.982  ops/s

…nchmarks

…c Dataset benchmarks

copy-pr-bot · 2025-06-20T14:42:35Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ldematte · 2025-06-20T15:01:06Z

On the C side things are pretty simple: the C++ API already has an overload accepting a std::ostream; the new C API builds a very simple memory-based std::ostream using the provided pointer + size and calls down to the existing C++ API.

I'd like to refine this further; ATM the C API user (including our Java bindings) needs to "guess" a "large enough" buffer to pass to cuvsCagraSerializeToMemory.
From what I see from

cuvs/cpp/src/neighbors/detail/cagra/cagra_serialize.cuh

Line 51 in 0dd7bde

void serialize(raft::resources const& res,

and

cuvs/cpp/src/neighbors/detail/dataset_serialize.hpp

Lines 45 to 67 in a572273

    
           void serialize(const raft::resources& res, 
        
                          std::ostream& os, 
        
                          const strided_dataset<DataT, IdxT>& dataset) 
        
           { 
        
             auto n_rows = dataset.n_rows(); 
        
             auto dim    = dataset.dim(); 
        
             auto stride = dataset.stride(); 
        
             raft::serialize_scalar(res, os, n_rows); 
        
             raft::serialize_scalar(res, os, dim); 
        
             raft::serialize_scalar(res, os, stride); 
        
             // Remove padding before saving the dataset 
        
             auto src = dataset.view(); 
        
             auto dst = raft::make_host_matrix<DataT, IdxT>(n_rows, dim); 
        
             RAFT_CUDA_TRY(cudaMemcpy2DAsync(dst.data_handle(), 
        
                                             sizeof(DataT) * dim, 
        
                                             src.data_handle(), 
        
                                             sizeof(DataT) * stride, 
        
                                             sizeof(DataT) * dim, 
        
                                             n_rows, 
        
                                             cudaMemcpyDefault, 
        
                                             raft::resource::get_cuda_stream(res))); 
        
             raft::resource::sync_stream(res); 
        
             raft::serialize_mdspan(res, os, dst.view());

it is possible to know the total size of the data before even serializing it (and with no need to grab data from the GPU too).

So I was thinking we may add functions to the C and C++ API to get the serialization size. Something like

cuvsCagraIndexSerializationSize(res, index, include_dataset, &length)

Wdyt?

chatman · 2025-06-30T18:22:01Z

On the C side things are pretty simple: the C++ API already has an overload accepting a std::ostream; the new C API builds a very simple memory-based std::ostream using the provided pointer + size and calls down to the existing C++ API.

FYI, I was trying to chase this exact optimization, by doing a direct serialization. I've just opened a draft PR #1071, where I was passing an existing file, and have cuVS append the index to it. After the file has been written to, checking the file size would give us the offset.

ldematte · 2025-07-02T11:18:02Z

I was passing an existing file, and have cuVS append the index to it

I thought about that, but it looked a bit of a "hack" to me and I preferred this approach. Let me elaborate.
On the C++ side there is a nice API that uses a std::ostream; on the C side it would be difficult to have something so flexible.
To achieve our goals this can be:

a file name + offset + length
a fd + offset + length
a pointer + length (the caller can use a file, via mmap -- we can do that the Java side too)
a function pointer void (*f)(void*, int), that the C API will use any time it needs to write 1 or more bytes.

I think 4. is the closest to a ostream but it's more complex; 1 and 2 are "weird" in a API IMO; 1 would also require re-opening on the C side something that might also already (exclusively) open for write on the Java side (by ES or Lucene); 2 could avoid that but dealing with file descriptors is difficult/impossible to do in Java. Also, 1 and 2 limit the functionality of the API to files (3/4 are more generic).

That's why I landed with 3 for this draft - it was generic enough and easy to implement.
@benfred, wdyt? I'd really like your input here.
I'm open to explore alternatives, including the function pointer one if you think it could make a better C API.

…segment-serialize # Conflicts: # java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraIndex.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/BruteForceIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CagraIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/DatasetImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java # java/cuvs-java/src/test/java/com/nvidia/cuvs/CagraBuildAndSearchIT.java

ldematte · 2025-07-14T16:05:14Z

Closed in favour of #1085

ldematte added 11 commits June 16, 2025 17:31

Use Dataset to encapsulate on-heap arrays too

d0f53c0

Missing assert; renaming

114285a

Add benchmarks project

02d58ab

Merge remote-tracking branch 'upstream/branch-25.08' into java/add-be…

42b4bde

…nchmarks

Add benchmarks README

84ec3d9

Merge branch 'java/add-benchmarks' into java/memorysegment-dataset

1a155ce

Add MemorySegment based Dataset (0-copy) + benchmark

01ca37e

Alternative: move MemorySegment creation "up", at params build time

86d9575

Different MemorySegment lifetime management (reduced scope) + specifi…

7f91a23

…c Dataset benchmarks

C/CPP changes: add cuvsCagraSerializeToMemory

840aac5

Add serialize(MemorySegment) + test + benchmark

381fb69

ldematte requested a review from a team as a code owner June 20, 2025 14:42

github-actions bot added the cpp label Jun 20, 2025

ldematte changed the title ~~[Java] Use MemorySegment for index serialization~~ [WIP][Java] Use MemorySegment for index serialization Jun 20, 2025

Update API to report and use the number of bytes written

dfce2c2

ChrisHegarty mentioned this pull request Jun 30, 2025

Examine bumping cuvs-java to a minimum of Java 22 #1066

Closed

cjnolet assigned ldematte Jun 30, 2025

cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jun 30, 2025

cjnolet added this to Vector Search, ML, & Data Mining Release Board Jun 30, 2025

cjnolet moved this to In Progress in Vector Search, ML, & Data Mining Release Board Jun 30, 2025

ldematte added 2 commits July 3, 2025 14:54

Fix formatting

f8500a4

ldematte mentioned this pull request Jul 4, 2025

[WIP][Java][C] Serialization via OutputStream (Java) -> function pointer (C) -> ostream (C++) #1085

Draft

ldematte closed this Jul 14, 2025

github-project-automation bot moved this from In Progress to Done in Vector Search, ML, & Data Mining Release Board Jul 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP][Java] Use MemorySegment for index serialization#1035

[WIP][Java] Use MemorySegment for index serialization#1035
ldematte wants to merge 14 commits intorapidsai:branch-25.08from
ldematte:java/memorysegment-serialize

ldematte commented Jun 20, 2025

Uh oh!

copy-pr-bot bot commented Jun 20, 2025

Uh oh!

ldematte commented Jun 20, 2025

Uh oh!

chatman commented Jun 30, 2025

Uh oh!

ldematte commented Jul 2, 2025 •

edited

Loading

Uh oh!

ldematte commented Jul 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ldematte commented Jun 20, 2025

Uh oh!

copy-pr-bot bot commented Jun 20, 2025

Uh oh!

ldematte commented Jun 20, 2025

Uh oh!

chatman commented Jun 30, 2025

Uh oh!

ldematte commented Jul 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ldematte commented Jul 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ldematte commented Jul 2, 2025 •

edited

Loading